Transcription initiation is the central 
point of gene expression regulation. 
Understanding the molecular mechanism 
of transcription regulation ultimately 
requires a structural understanding 
of the consequences of transcription factor 
binding to DNA-dependent RNA 
polymerase (RNAP), the enzyme of 
transcription. We recently determined 
a structure of a complex between 
transcription factor gp39 encoded by a 
Thermus bacteriophage and Thermus 
RNAP holoenzyme. In this addendum 
to the original publication, we highlight 
structural insights that explain the ability 
of gp39 to act as an RNAP specificity 
switch which inhibits transcription 
initiation from a major class of bacterial 
promoters, while allowing transcription 
from a minor promoter class to continue. 

The first structure of a multisubunit 
DNA-dependent RNA polymerase 
(RNAP) core enzyme, from the thermophilic 
bacterium Thermus aquaticus, was 
published in 1999 1 and heralded a 
new era in the study of transcription 
mechanisms. The structures of the Thermus 
RNAP holoenzyme, 2,3 its promoter and 
transcription elongation complexes, 4–7 and 
complexes with antibiotics, substrates, and 
low-molecular-weight regulators followed 
shortly. 8–12 Together, these structures 
provided a wealth of information about the 
mechanism of transcription but offered 
relatively little insight into the mechanism 
of transcription regulation. 



Historically, phages provided some 
of the classical examples of transcription 
regulation, and the paradigms revealed 
by such studies are widely applicable 
to transcription regulation in other 
systems, including higher organisms. 
The advantage of phages as a source of 
transcription regulators stems from the fact 
that all phages orchestrate expression of 
several temporal classes of their own genes 
and many phages shut down transcription 
of their hosts. This has to be achieved in a 
short period of time, which explains why 
many transcription regulators encoded 
by phages are very robust: they bind 
RNAP tightly and the consequences of 
their binding are generally strong. The 
best understood phage transcription 
regulators come, not surprisingly, from 
classical Escherichia coli phages such as λ, 
T4, and T7. Despite the wealth of genetic 
and biochemical data on the function of 
E. coli phage transcription factors, the 
structural understanding is lacking. The 
breakthroughs with Thermus RNAP 
structures were of little help, since known 
E. coli phage transcription regulators 
do not bind to, and therefore do not 
regulate the activity of, the crystallizable 
Thermus enzyme. One strategy under 
the circumstances was to put effort into 
obtaining diffracting crystals of E. coli 
RNAP, first alone and then in complex 
with known phage-encoded transcription 
factors. Such a strategy bore fruit in late 
2013 when a structure of the E. coli σ70 
RNAP holoenzyme complexed with 
Figure 1. The structure of the Thermus thermophilus σA holoenzyme complex with P23-45 phage gp39 protein. (A) The overall structure of the complex is 
shown on the left. The RNAP core is shown in gray except for the β-flap, shown in gold. The σA subunit DNA-binding domains 2 and 4 are shown in 
magenta; gp39 is shown in blue. On the right-hand side of the panel, the part of the structure containing gp39 is expanded, with the elements of gp39 structure 
discussed in the text (the core and the C-terminal tail) highlighted in blue and red, respectively. The exit point of nascent RNA from the complex is 
shown as a green oval. (B) A close-up view of the gp39-RNAP interaction site. See text for details. (C) A superposition of the β-flap/σ4 areas in the free RNAP 
holoenzyme (gray) and in the gp39-RNAP holoenzyme structures. The arrow indicates the extent of gp39-induced rotation of the DNA-binding helix of σ4. 



bacteriophage T7-encoded transcription 
initiation inhibitor gp2 was published. 13 
The structure confirmed many of the 
inferences obtained by earlier functional 
analyses (i.e., the location of inhibitor 
binding site, the main mode of promoter 
binding inhibition, etc. 14–16) and also 
provided non-trivial insights into the 
finer details of the inhibition. The second 
strategy was to try to isolate phages 
infecting Thermus, identify transcription 
factors encoded by these phages and 
characterize them both functionally and 
structurally. This line of research came 
as a close second to the "E. coli-centric" 
approach with a structure of the Thermus 
thermophilus σA RNAP holoenzyme 
complexed with bacteriophage P23-45 
dual function transcription regulator 
gp39. 17 



Almost ten years ago, a paper 
describing a collection of diverse phages 
infecting bacteria of the genus Thermus 
was published. 18 Initially, Thermus YS40 
phage, a myophage the size of the E. 
coli T4 phage known to be obsessive- 
compulsive about transcription regulation 
of its genes, 19 was deemed to be the best 
source of Thermus RNAP interacting 
regulators. However, its analysis yielded 
no transcription factors. 20,21 Despite this 
setback, the search for Thermus phage- 
encoded transcription factors continued 
with P23-45, a phage isolated from a 
Kamchatka Geyser valley hot spring. 
P23-45 is a siphovirus with an extremely 
long (1 micron) tail and a ~80 kbp 
double-stranded DNA genome. The early genes of 
P23-45 are likely transcribed by a highly 
unusual RNAP encoded by the phage 



genome 22 while a very long late gene 
cluster containing viral structural genes 
(39 genes constituting more than 45 kbp 
of DNA, including the 15,009 bp-long 
tape-measure protein gene, responsible 
for the unusually long tail of the virus) 
is transcribed by host RNAP from a late 
promoter that belongs to the extended 
-10 bacterial promoter class. Unlike 
the major -10/-35 class of bacterial 
promoters, the activity of extended -10 class 
promoters depends on RNAP interaction, 
through σ subunit region 2 (σ2), with 
just one promoter consensus element. The 
second interaction, between σ subunit 
region 4 (σ4) and the -35 promoter 
element, is not required for recognition of 
promoters of this class. A 141-amino-acid-long 
P23-45 protein, gp39, was identified 
as a component of host RNAP affinity 
Figure 2. A structural model of the gp39-controlled RNAP promoter specificity switch. On the left, the interaction of the RNAP holoenzyme with a -10/-35 
promoter is shown. Notice that the distance between the DNA-binding regions of σ2 and σ4 in the free holoenzyme (top) matches the distance between 
the -10 and -35 promoter consensus elements (below), allowing promoter complex formation. The binding of gp39 (middle) decreases the distance 
between σ2 and σ4, making productive interaction with -10/-35 promoters impossible. Promoter complex formation with extended -10 promoters can 
still occur (right). 



purified from P23-45-infected Thermus 
cultures. 22 Recombinant gp39 was 
shown to strongly inhibit transcription 
from -10/-35 promoters but not from 
extended -10 promoters. 22 Therefore, 
gp39 could execute an RNAP specificity 
switch, redirecting host RNAP from 
transcription of host genes to transcription 
of phage late genes. 

The structure of the gp39 complex with 
the σA RNAP holoenzyme 17 is shown in 
Figure 1A. The gp39 structure consists 
of 1) a core (residues 1-107) comprising 
the central β sheet and the N-terminal 
α helix (indicated by blue color in 
Figures 1A-C), and 2) the C-terminal 
tail (residues 118-141) (indicated in 
red); these two parts are connected by 
a linker that is mostly disordered. The 
primary site of gp39 binding is the 
RNAP β subunit flap domain (Fig. 1B). 
The gp39 core binds primarily to the flap 
protuberance (β subunit residues 723-740), 
which protrudes from a three-stranded 
β sheet that forms the main part of the 
β-flap (Fig. 1B). The interaction relies 
on shape complementarity and extensive 



hydrophobic/hydrophilic interactions. 
Disruption of these interactions by 
point mutations prevents gp39-RNAP 
interaction. 17 The secondary site of 
interaction is afforded by the α-helix of the 
gp39 C-terminal tail, which is fixed through 
interactions with the gp39 core, the tip of the 
β-flap, and σ4. 

In the holo•gp39 complex structure, 
the C-terminal helix of gp39 displaces an 
α-helix of σ4 that normally interacts with 
the β-flap tip (Fig. 1C). Consequently, 
σ4 and the β-flap tip are rotated by ~60° 
relative to the rest of the flap domain. The 
relative positions of σ4 and the β-flap tip 
are not changed during this motion but, as 
the end result, the position of σ4 is shifted 
by ~45 Å. As a result of the gp39-induced 
rotation, the simultaneous interaction of 
σ2 and σ4 with, respectively, the -10 and -35 
promoter consensus elements becomes 
impossible (Fig. 2). In contrast, the 
holo•gp39 complex structure is compatible 
with promoter complex formation on 
extended -10 type promoters (Fig. 2). 
Thus, the new structure explains, at 
the molecular level, the mechanism of 



promoter-specific inhibition of promoter 
binding by gp39. A gp39 mutant lacking 
the C-terminal tail binds RNAP normally 
but is severely defective in transcription 
initiation inhibition, underscoring the 
importance of the secondary interaction 
between the gp39 C-terminal helix and σ4 for 
the mechanism of transcription regulation. 
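
The two numbers quoted above for the gp39-induced motion (a rotation of about 60° and a shift of about 45 Å) can be related by simple geometry. The sketch below is only an illustration and rests on an assumption that is not stated in the text: that the motion is a pure rigid-body rotation about a single fixed axis, with no accompanying translation. Under that assumption, a point at perpendicular distance r from the axis moves by the chord length 2r·sin(θ/2). The function names are hypothetical and introduced here for illustration only.

```python
import math


def chord_displacement(radius, angle_deg):
    """Straight-line distance moved by a point at `radius` from the
    rotation axis after a rigid rotation by `angle_deg` degrees.

    Assumes a pure rotation about a fixed axis (an illustrative
    simplification, not a statement about the actual structure).
    """
    return 2.0 * radius * math.sin(math.radians(angle_deg) / 2.0)


def implied_radius(displacement, angle_deg):
    """Invert the chord formula: the distance from the axis at which a
    rotation by `angle_deg` degrees produces the given displacement."""
    return displacement / (2.0 * math.sin(math.radians(angle_deg) / 2.0))


# Approximate values quoted in the text.
ROTATION_DEG = 60.0
SHIFT_ANGSTROM = 45.0

radius = implied_radius(SHIFT_ANGSTROM, ROTATION_DEG)
print(f"Implied distance from rotation axis: {radius:.1f} A")
```

Because 2·sin(30°) = 1, a 60° rotation moves a point by a distance equal to its distance from the axis, so under this simplification the displaced part of σ4 would lie roughly 45 Å from the effective rotation axis.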

Thermus phages related to P23-45 
appear to be distributed worldwide. 
We recently isolated one such 
phage, named phiFa, from the vicinity of Mount 
Etna and determined its genome sequence 
(Fig. 3A). While standard bioinformatics 
analysis failed to identify a homolog of 
gp39 in phiFa, manual analysis revealed 
that the product of phiFa gene 15 contains 
counterparts of the gp39 residues important 
for the primary interaction with the β-flap 
protuberance (Fig. 3B). No similarity to 
gp39 outside this area could be detected, 
however. The phiFa gene 15 is located 
at the boundary of the early and middle 
gene clusters, i.e., in the same position as 
gene 39 in P23-45 (Fig. 3A). Like gp39, 
gp15 inhibits transcription initiation 
by Thermus RNAP from -10/-35 class 
Figure 3. A distant gp39 homolog encoded by the phiFa Thermus phage. (A) Schematic comparison of the P23-45 and phiFa phage genomes. Genes 
belonging to different temporal expression classes are color-coded. Arrows indicate the direction of transcription. Homologous genes in the two genomes 
are connected by thin dotted lines. The P23-45 gene coding for gp39, at the boundary of the early and middle gene clusters, is indicated. A rightward 
arrow indicates the P23-45 late promoter. Two validated transcription terminators in the late gene cluster are indicated. (B) Sequence alignment of P23-45 
gp39 and phiFa gp15 (single-letter amino acid code). Dots indicate identities; hyphens indicate gaps. The regions of gp39 that participate in the primary interaction 
with the RNAP core (through the β-flap protuberance) and that form the C-terminal helix are highlighted in blue and red, respectively. Individual 
amino acids involved in interactions with the β-flap and σ4 are indicated by a correspondingly colored background in the alignment. (C) The results of 
abortive in vitro transcription by the Thermus RNAP σA holoenzyme from the -10/-35 class T7 A1 promoter in the absence or in the presence of increasing 
concentrations of P23-45 gp39 or phiFa gp15 are shown. The transcript CpApU is indicated by a horizontal arrow. An autoradiogram of a denaturing 
polyacrylamide gel is shown. The strong radioactive band present at the bottom of each lane is unincorporated 32P-labeled UTP substrate. 



promoters (Fig. 3C). While the primary 
RNAP binding site of gp39 and gp15 is 
likely identical, the phiFa protein does not 
have residues matching the P23-45 gp39 amino 
acids that form the C-terminal tail helix 
that packs against σ4, and its C-terminal 
tail is considerably longer than that of 
gp39 (Fig. 3B). Thus, the mechanism of 
transcription initiation inhibition may be 
different for different gp39-like proteins. 
Structure determination of gp15, alone 
and in complex with Thermus RNAP 
holoenzyme, will allow us to test this 
conjecture in the near future. 

The σ4/β-flap interface emerges as a 
critical point accepting regulatory inputs 
from the "upstream" side of RNAP. The 
case of the E. coli bacteriophage T4 AsiA 
protein provides another example. AsiA is 
responsible for a switch from host and early 
phage promoters (which belong to the -10/-35 
class) to middle phage promoters (which belong 
to the extended -10 class). Unlike gp39, 
AsiA binds to σ70 region 4. While the 
structural information is lacking, available 
biochemical and biophysical data suggest 
that AsiA binding disrupts the σ4•β-flap 
interaction and also likely reorients 
both the flap and σ4, thus achieving the 
same effect as gp39. Unlike AsiA, 23 gp39 
is also a transcription elongation factor, 
strongly antiterminating transcription 
by Thermus RNAP on all intrinsic 
transcription terminators tested. 24 
This function is likely biologically 
relevant, allowing gp39 to help host RNAP 
transcribe through a very long cluster 
of late phage genes separated by several 
terminators (Fig. 3A). The transcription 
antitermination function of gp39 must 
be σ-independent, as σ dissociates from 
the RNAP core upon promoter escape. Indeed, 
the gp39 mutant lacking the C-terminal 
helix needed for transcription initiation 
inhibition mentioned above is fully 
capable of antiterminating transcription. 
The location of gp39 primary binding site, 
close to the nascent RNA exit pathway 
(Fig. 1A), suggests that gp39 may affect 
transcription termination properties by 



altering RNAP interactions with exiting 
RNA. Interestingly, the phage λ transcription 
antitermination protein Q may also affect 
the transcription termination properties of E. 
coli RNAP through the flap domain. 25,26 
Irrespective of which elongation complex 
structure is determined first (E. coli RNAP•λ Q 
or Thermus RNAP•P23-45 gp39), 
comparative analyses of 
complexes between E. coli and Thermus 
phage-encoded transcription regulators 
with cognate RNAPs will continue 
to provide insights into molecular 
mechanisms of transcription regulators for 
years to come. 